

Towards a Holistic Evaluation of LLMs on Factual Knowledge Recall

Yuan, Jiaqing, Pan, Lin, Hang, Chung-Wei, Guo, Jiang, Jiang, Jiarong, Min, Bonan, Ng, Patrick, Wang, Zhiguo

arXiv.org Artificial Intelligence

Large language models (LLMs) have shown remarkable performance on a variety of NLP tasks, and are being rapidly adopted in a wide range of use cases. It is therefore of vital importance to holistically evaluate the factuality of their generated outputs, as hallucinations remain a challenging issue. In this work, we focus on assessing LLMs' ability to recall factual knowledge learned from pretraining, and the factors that affect this ability. To that end, we construct FACT-BENCH, a representative benchmark covering 20 domains, 134 property types, 3 answer types, and different knowledge popularity levels. We benchmark 31 models from 10 model families and provide a holistic assessment of their strengths and weaknesses. We observe that instruction-tuning hurts knowledge recall: pretraining-only models consistently outperform their instruction-tuned counterparts. We also observe positive effects of model scaling: larger models outperform smaller ones across all model families. However, even the best performance, from GPT-4, still leaves a large gap to the upper bound. We additionally study the role of in-context exemplars using counterfactual demonstrations, which lead to significant degradation of factual knowledge recall for large models. By further decoupling known and unknown knowledge, we find the degradation is attributable to exemplars that contradict a model's known knowledge, and to the number of such exemplars. Lastly, we fine-tune LLaMA-7B in different settings of known and unknown knowledge. In particular, fine-tuning on a model's known knowledge is beneficial, and consistently outperforms fine-tuning on unknown and mixed knowledge. We will make our benchmark publicly available.


Tesla on autopilot smacked into Florida Highway Patrol cruiser that stopped to help disabled vehicle

Daily Mail - Science & tech

A Tesla Model 3 driving on 'autopilot' smacked into a Florida Highway Patrol cruiser on Saturday morning, narrowly missing the driver of the cruiser, who had stopped in order to help a disabled vehicle. The incident is the 12th such smash involving a Tesla on autopilot mode and an emergency vehicle. All the vehicles that were struck had their lights flashing, or had deployed an emergency flare, illuminated warning sign or cones, raising questions about whether these may have confused the Tesla's sensors. Saturday's smash happened after the 28-year-old trooper, who has not been named, stopped shortly after 5 am on August 28 on I-4 near downtown Orlando while responding to a broken-down car. He put on his emergency lights and was walking over to the disabled vehicle when the Tesla hit the cruiser's left side, according to a copy of the police report seen by DailyMail.com.


U.S. Opens Investigation Into Tesla's Autopilot Driving System

TIME - Tech

The U.S. government has opened a formal investigation into Tesla's Autopilot partially automated driving system after a series of collisions with parked emergency vehicles. The investigation covers 765,000 vehicles, almost everything that Tesla has sold in the U.S. since the start of the 2014 model year. Of the crashes identified by the National Highway Traffic Safety Administration as part of the probe, 17 people were injured and one was killed. NHTSA says it has identified 11 crashes since 2018 in which Teslas on Autopilot or Traffic Aware Cruise Control have hit vehicles at scenes where first responders have used flashing lights, flares, an illuminated arrow board or cones warning of hazards. The agency announced the action Monday in a posting on its website.


Tesla's Autopilot faces US investigation after crashes with emergency vehicles

The Guardian

The US government has opened a formal investigation into Tesla's Autopilot partially automated driving system after a series of collisions with parked emergency vehicles. The investigation covers 765,000 vehicles, almost everything that Tesla has sold in the US since the start of the 2014 model year. Of the crashes identified by the National Highway Traffic Safety Administration (NHTSA) as part of the investigation, 17 people were injured and one was killed. NHTSA says it has identified 11 crashes since 2018 in which Teslas on Autopilot or Traffic Aware Cruise Control have hit vehicles at scenes where first responders used flashing lights, flares, an illuminated arrow board or cones warning of hazards. The agency announced the action on Monday in a posting on its website.


Real-life RoboCop was at the scene of a crime. Then it moved on.

#artificialintelligence

When a fight broke out recently in the parking lot of Salt Lake Park, a few miles south of downtown Los Angeles, Cogo Guebara did what seemed the most practical thing at the time: she ran over to the park's police robot to push its emergency alert button. "I was pushing the button but it said, 'step out of the way,'" Guebara said. "It just kept ringing and ringing, and I kept pushing and pushing." She thought maybe the robot, which stands about 5 feet tall and has "POLICE" emblazoned on its egg-shaped body, wanted a visual of her face, so she crouched down for the camera. When the robot still didn't respond, Rudy Espericuta, who was with Guebara and her children at the time, dialed 911.